Project-Team:STARS

Inria | Raweb 2017 | Presentation of the Project-Team STARS | STARS Web Site



Section: New Results

Recognizing Human Actions Using RGB Sport Videos From the Web

Participants : Amir Nazemi, François Brémond.

keywords: Action Recognition, Activity Recognition, Video Summarization, Web Sport Videos, Golf Videos.

The aim of this work is to extract sport actions from web sport streaming videos and use them for highlight detection. The sport videos used in this research are golf videos. This report covers four steps: data preparation, the framework, method selection and experimental results.

Figure 23. The output of human pose detection on one frame of the golf video dataset.

Data Preparation

**Table 8.** The Golf dataset.
Class names	Number of samples
Tee-shot	73
Putt	70
Standing	81

**Table 9.** The experimental results of performing two different methods on the golf dataset.
Methods	Accuracy on Golf Dataset
LSTM + Geometrical Features	91.5 %
P-CNN	97.32 %

First, a dataset is built from a streaming video. It contains three action classes: Tee-shot, Putt and Standing. Table 8 describes the dataset.
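To put the class counts of Table 8 in perspective, the short sketch below computes the dataset size and the share of each class. The "majority baseline" it reports is an illustrative reference point of our own (the accuracy obtained by always predicting the most frequent class over the whole dataset); the report does not describe its train/test split, so this is an approximation and not a figure from the experiments.

```python
# Sample counts copied from Table 8.
GOLF_DATASET = {"Tee-shot": 73, "Putt": 70, "Standing": 81}


def class_shares(counts):
    """Return each class's share of the total number of samples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_samples = sum(GOLF_DATASET.values())
shares = class_shares(GOLF_DATASET)

# Accuracy of a trivial classifier that always answers the largest class
# (an assumption-free reference point, not a result from the report).
majority_baseline = max(shares.values())

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total samples: {total_samples}, majority baseline: {majority_baseline:.1%}")
```

The three classes are close to balanced, so any accuracy well above roughly one third reflects genuine discrimination between the actions.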

Framework

After preparing the dataset, the next step is to define a solution to the problem. Since one of the main goals of this research is to provide a general solution for sport videos, we propose a solution based on skeletons, i.e. human poses. Our framework consists of human pose detection, human tracking and action recognition, in that order. For human pose detection we used a recent method named OpenPose [105]. For human pose tracking we used a tracking method from the Inria STARS SUP framework. Finally, for action recognition we ran experiments to choose the best method.
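The three-stage framework can be sketched as a simple composition of functions. This is only a structural illustration: `recognize_actions` and its three callable parameters are hypothetical names, and they stand in for the pose detector, the SUP tracker and the action classifier without reproducing any of their real interfaces.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")           # whatever image type the detector accepts
Keypoint = Tuple[float, float]     # (x, y) image coordinates of one joint
Pose = List[Keypoint]              # one skeleton in one frame
Track = List[Pose]                 # poses of one person over time


def recognize_actions(
    frames: Sequence[Frame],
    detect_poses: Callable[[Frame], List[Pose]],
    track_poses: Callable[[List[List[Pose]]], Dict[int, Track]],
    classify_track: Callable[[Track], str],
) -> Dict[int, str]:
    """Run pose detection, tracking and action recognition in sequence.

    All three callables are placeholders supplied by the caller.
    """
    # Stage 1: detect every skeleton in every frame.
    poses_per_frame = [detect_poses(frame) for frame in frames]
    # Stage 2: link detections across frames into one track per person.
    tracks = track_poses(poses_per_frame)
    # Stage 3: assign one action label to each track.
    return {track_id: classify_track(track) for track_id, track in tracks.items()}
```

Keeping the stages behind plain callables means the action recognition step can be swapped without touching detection or tracking, which matches the idea of comparing several recognition methods on the same poses.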

Methods selection

Among the different methods in the field of action recognition, we selected P-CNN [55], which is the state of the art on some datasets. In addition, to have an alternative solution that is faster than P-CNN, we proposed a method based on geometrical features of human poses. In this second solution, the geometrical features are fed to a Long Short-Term Memory (LSTM) network.
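The report does not list which geometrical features were used. As a minimal sketch under that caveat, the code below computes two common choices from 2D keypoints: the angle at a joint and a distance between joints normalised by a reference limb length. The function names, the joint index arguments and the choice of features are all assumptions made for illustration.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at point b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: two of the joints coincide
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def joint_distance(p, q):
    """Euclidean distance between two keypoints."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def frame_features(pose, angle_triplets, distance_pairs, scale_pair):
    """Build one feature vector for one pose (a list of (x, y) keypoints).

    angle_triplets, distance_pairs and scale_pair hold hypothetical joint
    indices chosen by the caller; distances are divided by the length of
    the scale_pair segment so that features do not depend on image scale.
    """
    scale = joint_distance(pose[scale_pair[0]], pose[scale_pair[1]]) or 1.0
    angles = [joint_angle(pose[i], pose[j], pose[k]) for i, j, k in angle_triplets]
    distances = [joint_distance(pose[i], pose[j]) / scale for i, j in distance_pairs]
    return angles + distances


def sequence_features(track, angle_triplets, distance_pairs, scale_pair):
    """One feature vector per frame: the kind of sequence an LSTM consumes."""
    return [frame_features(p, angle_triplets, distance_pairs, scale_pair) for p in track]
```

Such vectors are much smaller than image patches, which is one plausible reason a pose-feature model can run faster than a CNN operating on pixels.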

Experimental Results

Table 9 shows the results of the selected methods on the prepared golf dataset. As the table shows, P-CNN (97.32 %) is more accurate than the LSTM with geometrical features (91.5 %).
